Fibonacci Binning
This note argues that when dot-plotting distributions typically found in
papers about web and social networks (degree distributions, component-size
distributions, etc.), and more generally distributions that have high
variability in their tail, an exponentially binned version should always be
plotted, too, and suggests Fibonacci binning as a visually appealing,
easy-to-use and practical choice.
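
The idea of exponential binning with Fibonacci-sized bins can be sketched as follows. This is an illustrative reading, not necessarily the note's exact procedure: it assumes positive integer values, half-open bins delimited by consecutive Fibonacci numbers (1, 2, 3, 5, 8, ...), and a mean frequency per bin. The function names are hypothetical.

```python
from collections import Counter

def fibonacci_bins(max_value):
    """Return bin edges 1, 2, 3, 5, 8, ... whose last edge exceeds max_value."""
    edges = [1, 2]
    while edges[-1] <= max_value:
        edges.append(edges[-1] + edges[-2])
    return edges

def fibonacci_binning(values):
    """Average the frequency of positive integers over Fibonacci-sized bins.

    Returns one (left, right, mean_frequency) triple per half-open
    bin [left, right). Assumes every value is an integer >= 1.
    """
    counts = Counter(values)
    edges = fibonacci_bins(max(counts))
    result = []
    for left, right in zip(edges, edges[1:]):
        total = sum(counts.get(v, 0) for v in range(left, right))
        # Dividing by the bin width smooths the noisy tail of the distribution.
        result.append((left, right, total / (right - left)))
    return result
```

Because the bin widths grow geometrically, the resulting points are roughly evenly spaced on a logarithmic axis, which is what makes the binned plot readable in the tail.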
Broadword Implementation of Parenthesis Queries
We continue the line of research started in "Broadword Implementation of
Rank/Select Queries" proposing broadword (a.k.a. SWAR, "SIMD Within A
Register") algorithms for finding matching closed parentheses and the k-th far
closed parenthesis. Our algorithms work in time O(log w) on a word of w bits,
and contain no branch and no test instruction. On 64-bit (and wider)
architectures, these algorithms make it possible to avoid costly tabulations,
while providing a very significant speedup with respect to for-loop
implementations.
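
For context, the kind of for-loop baseline that the abstract compares against might look like the sketch below. It is not the broadword algorithm itself. The encoding is an assumption made for illustration: a set bit is an open parenthesis, a clear bit is a closed one, and bit 0 is the first symbol. The function name and the `width` parameter are hypothetical.

```python
def find_close(word, pos, width=64):
    """Return the index of the parenthesis matching the open one at `pos`.

    Scans bit by bit, tracking the excess of open over closed parentheses.
    Returns -1 if no match exists within the first `width` bits.
    """
    if not (word >> pos) & 1:
        raise ValueError("bit at pos is not an open parenthesis")
    excess = 0
    for i in range(pos, width):
        excess += 1 if (word >> i) & 1 else -1
        if excess == 0:
            return i
    return -1
```

With this encoding the string `(()())` is the 6-bit word `0b001011`, and `find_close(0b001011, 0, 6)` scans all six bits before the excess returns to zero. The loop takes time linear in the distance to the match and branches on every bit, which is the cost a branch-free O(log w) method avoids.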
Supremum-Norm Convergence for Step-Asynchronous Successive Overrelaxation on M-matrices
Step-asynchronous successive overrelaxation updates the values contained in a
single vector using the usual Gauss-Seidel-like weighted rule, but arbitrarily
mixing old and new values, the only constraint being temporal coherence: you
cannot use a value before it has been computed. We show that given a
nonnegative real matrix $A$, a $\sigma \geq \rho(A)$ and a vector $w > 0$ such
that $Aw \leq \sigma w$, every iteration of step-asynchronous successive
overrelaxation for the problem $(sI - A)x = b$, with $s > \sigma$, reduces
geometrically the $w$-norm of the current error by a factor that we can compute
explicitly. Then, we show that given a $\sigma > \rho(A)$ it is in principle
always possible to compute such a $w$. This property makes it possible to
estimate the supremum norm of the absolute error at each iteration without any
additional hypothesis on $A$, even when $A$ is so large that computing the
product $Ax$ is feasible, but estimating the supremum norm of $(sI - A)^{-1}$
is not.
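
As a point of reference, ordinary sequential successive overrelaxation can be sketched as below. This is the textbook variant, in which each update always sees the newest values. The step-asynchronous variant described above would allow an arbitrary, temporally coherent mix of old and new values instead. The function name, the default relaxation weight `omega` and the fixed iteration count are assumptions chosen for illustration.

```python
def sor(a, b, omega=1.0, iterations=100):
    """Approximate the solution of a x = b by successive overrelaxation.

    `a` is a square matrix given as a list of rows with nonzero diagonal.
    omega = 1.0 gives plain Gauss-Seidel.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        for i in range(n):
            # Off-diagonal contribution, using the values currently in x.
            s = sum(a[i][j] * x[j] for j in range(n) if j != i)
            # Weighted average of the old value and the Gauss-Seidel update.
            x[i] = (1 - omega) * x[i] + omega * (b[i] - s) / a[i][i]
    return x
```

For example, `sor([[4.0, -1.0], [-1.0, 4.0]], [3.0, 3.0], omega=1.1)` works on a small M-matrix whose exact solution is the all-ones vector.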
Stanford Matrix Considered Harmful
This note argues about the validity of web-graph data used in the literature.
An experimental exploration of Marsaglia's xorshift generators, scrambled
Marsaglia recently proposed xorshift generators as a class of very fast,
good-quality pseudorandom number generators. Subsequent analysis by Panneton
and L'Ecuyer has lowered the expectations raised by Marsaglia's paper, showing
several weaknesses of such generators, verified experimentally using the
TestU01 suite. Nonetheless, many of the weaknesses of xorshift generators fade
away if their result is scrambled by a non-linear operation (as originally
suggested by Marsaglia). In this paper we explore the space of possible
generators obtained by multiplying the result of a xorshift generator by a
suitable constant. We sample generators at 100 equispaced points of their state
space and obtain detailed statistics that lead us to choices of parameters that
improve on the current ones. We then explore for the first time the space of
high-dimensional xorshift generators, following another suggestion in
Marsaglia's paper, finding choices of parameters providing periods of length
$2^{1024} - 1$ and $2^{4096} - 1$. The resulting generators are of extremely
high quality, faster than current similar alternatives, and generate
long-period sequences passing strong statistical tests using only eight logical
operations, one addition and one multiplication by a constant.
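
The basic construction, a xorshift step followed by multiplication by a constant, can be sketched with a single 64-bit word of state. The shift triple (12, 25, 27) and the multiplier below are the ones commonly quoted for the generator known as xorshift64*. Treat them as an assumption here, since the abstract does not list parameters. The high-dimensional generators mentioned above keep a larger state and are not shown.

```python
MASK64 = (1 << 64) - 1  # Python integers are unbounded, so truncate to 64 bits.

def xorshift64star(state):
    """Advance a nonzero 64-bit state and return (new_state, output)."""
    x = state & MASK64
    x ^= x >> 12
    x ^= (x << 25) & MASK64
    x ^= x >> 27
    # The multiplication scrambles the linear xorshift output.
    return x, (x * 2685821657736338717) & MASK64

def take(seed, n):
    """Return the first n outputs for a nonzero seed."""
    outputs = []
    state = seed
    for _ in range(n):
        state, value = xorshift64star(state)
        outputs.append(value)
    return outputs
```

The state must never be zero: the xorshift step maps zero to zero, so a zero seed would produce a constant stream.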
Efficient Optimally Lazy Algorithms for Minimal-Interval Semantics
Minimal-interval semantics associates with each query over a document a set
of intervals, called witnesses, that are incomparable with respect to inclusion
(i.e., they form an antichain): witnesses define the minimal regions of the
document satisfying the query. Minimal-interval semantics makes it easy to
define and compute several sophisticated proximity operators, provides snippets
for user presentation, and can be used to rank documents. In this paper we
provide algorithms for computing conjunction and disjunction that are linear in
the number of intervals and logarithmic in the number of operands; for
additional operators, such as ordered conjunction and Brouwerian difference, we
provide linear algorithms. In all cases, space is linear in the number of
operands. More importantly, we define a formal notion of optimal laziness, and
either prove it, or prove its impossibility, for each algorithm. We cast our
results in a general framework of antichains of intervals on total orders,
making our algorithms directly applicable to other domains.
Comment: 24 pages, 4 figures. A preliminary (now outdated) version was
presented at SPIRE 200
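
To make the notion of witnesses concrete, the sketch below computes the antichain of minimal intervals for a conjunction by brute force. It is only an illustration of the semantics: it enumerates every combination of positions, so it is neither lazy nor linear, unlike the algorithms the abstract describes. The function names and the representation of each operand as a list of term positions are assumptions.

```python
from itertools import product

def minimal_intervals(intervals):
    """Keep only intervals that do not contain another, distinct interval."""
    unique = set(intervals)
    return sorted(
        (l, r)
        for (l, r) in unique
        if not any(
            (l2, r2) != (l, r) and l <= l2 and r2 <= r for (l2, r2) in unique
        )
    )

def and_witnesses(*position_lists):
    """Minimal intervals covering one position from each operand."""
    candidates = [(min(c), max(c)) for c in product(*position_lists)]
    return minimal_intervals(candidates)
```

For two terms occurring at positions `[1, 4]` and `[2, 3]`, the candidate intervals are (1, 2), (1, 3), (2, 4) and (3, 4). Only (1, 2) and (3, 4) survive, because each of the other two contains one of them.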
Four Degrees of Separation, Really
We recently measured the average distance of users in the Facebook graph,
spurring comments in the scientific community as well as in the general press
("Four Degrees of Separation"). A number of interesting criticisms have been
made about the meaningfulness, methods and consequences of the experiment we
performed. In this paper we want to discuss some methodological aspects that we
deem important to underline in the form of answers to the questions we have
read in newspapers, magazines, blogs, or heard from colleagues. We indulge in
some reflections on the actual meaning of "average distance" and make a number
of side observations showing that, yes, 3.74 "degrees of separation" are really
few.
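
The quantity under discussion, the average distance between reachable pairs of nodes, can be computed exactly on a toy graph with one breadth-first search per node, as sketched below. This is only meant to pin down the definition. The adjacency-dictionary representation and the function name are assumptions, and a graph of Facebook's size would call for approximation rather than this quadratic approach.

```python
from collections import deque

def average_distance(adjacency):
    """Mean shortest-path length over ordered pairs of distinct, reachable nodes.

    `adjacency` maps each node to an iterable of its neighbours.
    Raises ZeroDivisionError if no node can reach any other node.
    """
    total = 0
    pairs = 0
    for source in adjacency:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1  # Exclude the source itself.
    return total / pairs
```

On the path graph `{0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}` the six unordered pairs have distances 1, 2, 3, 1, 2 and 1, so the average is 10/6.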